

Training  Research  Laboratory 


Department of Psychology, Bureau of Educational Research

University of Illinois, 6 Lincoln Hall, Urbana, Illinois


CONTEXTUAL  PREDICTABILITY  AND  FREQUENCY  FACTORS 

Domenico Parisi
Ulderico Cappelli
Lawrence  M.  Stolurow 


Technical  Report  No.  41 
August,  1966 


Communication,  Cooperation,  and  Negotiation 
in  Culturally  Heterogeneous  Groups 

Project  Supported  by  the 

Advanced Research Projects Agency, ARPA Order No. 454
Under  Office  of  Naval  Research  Contract  NR  177-472,  Nonr  1834(36) 

Fred E. Fiedler, Lawrence M. Stolurow, and Harry C. Triandis
Principal  Investigators 


DISTRIBUTION  OF  THIS 
DOCUMENT  IS  UNLIMITED 


Contextual  Predictability  and  Frequency  Factors 

Domenico Parisi, Ulderico Cappelli,
and  Lawrence  M.  Stolurow 

Abstract 

Cloze scores were obtained from 320 Ss for two written Italian
passages totaling 616 words in such a way that each word was guessed
by 32 Ss. Each word was classified into one of 12 grammatical classes.
As has been found for English, content words are less predictable
than function words if guessing the specific missing item is required.
No such difference exists when only correct form class has to be
predicted. Type-token ratio for each class appears to be correlated
with specific item predictability, whereas proportion of occurrences
of each form class in the language is correlated with form class
predictability. Both correlations suggest that frequency properties
may be an important factor even in complex language behavior.

In recent years the development of psycholinguistics has fostered
much  interest  in  the  long  neglected  grammatical  aspects  of  language 
behavior.  The  study  of  the  relationships  among  elements  in  linguistic 
sequences  has  been  approached  by  a  variety  of  techniques,  mainly 
derived  from  two  different  and,  to  a  large  extent,  opposed  sources: 
information  theory  and  linguists'  descriptions  of  syntactical  structure. 
Among the techniques inspired by information theory, statistical
approximations to English have yielded a number of interesting results.
However, the statistical approach is seriously limited by studying the
effect  of  preceding  context  on  subsequent  behavior  and  ignoring  the 
influence  of  succeeding  context.  Both  particular  studies  (Goldman- 
Eisler,  1958;  Lieberman,  1963)  and  general  observations  (Osgood  and 
Sebeok,  1954)  suggest  that  each  linguistic  segment  is  a  function  of 
both  what  precedes  and  follows  it. 

The  global  effect  of  bi-directional  context  can  be  effectively 
assessed  by  the  Cloze  technique  developed  by  Taylor  (1953).  A 
number of words are deleted from a text and subjects are asked to
reconstitute it by guessing the missing words. At least two dimensions
of linguistic behavior can be studied by this approach: (1) predictability
of a specific item and (2) predictability of the grammatical
class to which the correct item belongs. Dependencies among words
are responsible for both lexical and grammatical predictability,
but the two dimensions are partly uncorrelated and probably
reflect the effects of at least two partly different determinants.

We gratefully acknowledge the help received from Professor F.
Agard, Department of Linguistics, Cornell University, who prepared a
multi-level structural classification of the 616 words from which
we selected our 12-class subdivision.

The present study has a twofold purpose. By using the Cloze
procedure with two samples of Italian written language, both lexical
and grammatical predictability of different form classes of Italian
words will be determined and compared with data from different
languages. In addition, Fillenbaum et al. (1963) have found that
in English semantic form classes (nouns, adjectives, and verbs) are
more difficult to reconstitute than syntactic form classes (articles,
auxiliary verbs, prepositions, and conjunctions) if scores based
upon verbatim reproduction are considered, but this difference
disappears when only grammatical predictability is concerned. We want
to see if the same happens in Italian and, furthermore, if the
relationship is influenced by varying text difficulty.

A second purpose of this study is to look for determinants of
the two types of predictability. Contextual effects can be interpreted
as due to long-range language learning. A subject is able to predict
the right word or the right form class in a particular place
because of his long experience with language. Frequency has been found
to be a powerful variable in rote verbal learning (Underwood and
Schulz, 1960). However, the question may be asked what the effects
of frequency will be when a radically different type of verbal
behavior is considered. The most direct approach in assessing
the relationship between frequency and contextual predictability is
to use a frequency list of words such as Thorndike and Lorge have put
together for English (1959). Since no such list is available for
Italian, a different approach was followed which would allow the
extraction of some measures of frequency of use from smaller samples
of language.


Method 

Materials 

Two  Italian  prose  passages  (Text  A  and  Text  B)  of  301  and  315 
words,  respectively,  were  used  as  materials  for  the  Cloze  procedure. 
Text A was drawn from a daily paper and is a report of a road accident.
Text B is an excerpt from a novel by V. Brancati. In order to get
Cloze scores for each word in both texts, five versions of each text
were prepared. Version 1 had the 1st, 6th, 11th, etc. word deleted;
version 2 had the 2nd, 7th, 12th, etc. word deleted, and so on.
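
The rotation of deletions across the five versions can be illustrated with a minimal Python sketch. The function name, the blank marker, and the short Italian sample sentence are illustrative assumptions, not materials from the study.

```python
def make_cloze_versions(words, n_versions=5, blank="_____"):
    """Build n_versions mutilated copies of a text.

    Version k (0-based) blanks the words at positions k, k + n, k + 2n, ...
    so that every word is deleted in exactly one version.
    Returns a list of (mutilated_words, deleted_words) pairs.
    """
    versions = []
    for k in range(n_versions):
        mutilated = [blank if i % n_versions == k else w
                     for i, w in enumerate(words)]
        deleted = [w for i, w in enumerate(words) if i % n_versions == k]
        versions.append((mutilated, deleted))
    return versions


# Hypothetical sample sentence, used only to show the deletion pattern.
text = "il treno parte alle nove e arriva a Roma verso mezzogiorno".split()
versions = make_cloze_versions(text)

for number, (mutilated, deleted) in enumerate(versions, start=1):
    print(f"Version {number}: {' '.join(mutilated)}  (deleted: {deleted})")
```

Because each position falls in exactly one version, pooling the five versions yields a Cloze score for every word of the text.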

Subjects

The subjects were 320 students, 17 to 22 years of age. About
one-half were male and one-half female. One-third of the
sample were students in the last year of high school, and the remaining
two-thirds were college students.

Procedure 

Ss  were  randomly  given  one  of  the  two  mutilated  texts  with 
instructions  to  fill  in  all  the  blanks  with  the  words  they  thought 
most  likely  to  appear  in  the  intact  text.  Each  S  had  one  of  the 
five  versions  of  either  Text  A  or  Text  B.  Therefore,  each  of  the 
616  words  was  guessed  by  32  Ss.  Time  for  completing  the  work  was 
unlimited,  but  Ss  were  told  in  the  instructions  that  they  should 
finish in about 10 or 15 minutes.

Results 

For each word in the two passages a verbatim (V) score and
a form class (FC) score were computed. The V score was the percentage
of Ss filling in the blank with a word either identical to the missing
word or clearly just a misspelling of it. The FC score was the percentage
of Ss giving a word which was in the same grammatical class as
the correct word. Mean V score was 67 per cent for Text A and
54 per cent for Text B. This difference was taken as a difference
in text difficulty (Taylor, 1953).
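
The two scores can be sketched in Python as follows. This is a simplified illustration: the function name and the response data are hypothetical, and exact string matching stands in for the judgment of misspellings, which the study evidently made by hand.

```python
def cloze_scores(target_word, target_class, responses):
    """Return (V score, FC score) as percentages for one deleted word.

    responses: list of (word, grammatical_class) pairs, one per subject.
    Assumption: a response counts as verbatim only on an exact,
    case-insensitive match; misspellings are not handled here.
    """
    n = len(responses)
    v_hits = sum(1 for word, _ in responses
                 if word.lower() == target_word.lower())
    fc_hits = sum(1 for _, cls in responses if cls == target_class)
    return 100.0 * v_hits / n, 100.0 * fc_hits / n


# Hypothetical responses of 32 subjects to one blank whose
# correct filler is the noun "strada".
responses = ([("strada", "N")] * 16      # verbatim correct
             + [("via", "N")] * 8        # wrong word, right form class
             + [("veloce", "ADJ")] * 8)  # wrong word, wrong form class

v_score, fc_score = cloze_scores("strada", "N", responses)
print(v_score, fc_score)
```

A verbatim hit is always a form class hit as well, so the FC score of a word can never fall below its V score.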

Each of the 616 words of the two texts was classified into
one of 12 grammatical classes: nouns (N), qualifying adjectives
(ADJ), verbs (V), adverbs (ADV), quantitative adjectives (Q),
articles (AR), prepositions (PRE), conjunctions (C), auxiliary
verbs (AV), pronouns (P), other adjectives (OA), and non-classified
(NC). Table 1 shows number of items and V and FC scores for each
grammatical class, both for each text separately and for both
texts together. Also shown are V and FC scores for content words
(nouns, qualifying adjectives, verbs, adverbs, quantitative
adjectives) and for function words (the remaining ones). If
guessing of specific items is required, content words are more
difficult to reconstitute than function ones. If only form class
is considered, the difference disappears. Both results are in
agreement with findings reported for English by Fillenbaum et al.
(1963). Furthermore, the difference in specific item difficulty
between content and function words appears to increase with text
difficulty, as it obviously should, since text difficulty depends
much more on content word difficulty than on function word
difficulty. On the other hand, if FC scores are considered, the two
texts do not differ very much in either overall difficulty or
differential difficulty of content and functional items.

Fillenbaum et al. (1963; see also Ervin-Tripp and Slobin,
1966) attributed the differential predictability of various
grammatical classes to class size, that is, to the number of items
included in each class. To verify this hypothesis, the rank-order
correlation coefficient between verbatim predictability and number
of different items in each class in our 616-word sample was
calculated. This coefficient is -.21, which is well below significance.
The conclusion that predictability of specific words in a context
is determined not by the size of the class they belong to, but by
the type-token ratio of that class, which is an index of frequency
of use, seems, therefore, to be warranted.
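
A rank-order correlation of this kind can be sketched in Python. The report does not say which coefficient was computed; the sketch assumes Spearman's rho, obtained as the Pearson correlation of the ranks with tied values given their average rank. The function names and the sample values are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation (Pearson correlation of ranks)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-class values (not the data of Table 1).
class_sizes = [120, 45, 80, 30, 12]
v_scores = [48.0, 41.0, 52.0, 44.0, 60.0]
print(round(spearman_rho(class_sizes, v_scores), 2))
```

With 12 classes, each variable contributes 12 values, one per grammatical class.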

The type-token ratio (TTR) was calculated for each grammatical
class, using Texts A and B together as the language sample. That
is, for each of the 12 classes the number of different items
occurring in both texts was divided by the total number of items
in the class. The rank-order correlation between TTR and mean V
score for each class was -.00, which is significant at p < .01
(Figure 1). Furthermore, the number of occurrences of each class
was correlated with the mean FC score of that class, and the rank-order
correlation coefficient was +.06, which is significant at p < .001
(Figure 2).

[Table 1. Number of Items, V and FC Scores]
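
The per-class type-token ratio can be sketched in Python. The function name and the tiny tagged sample are hypothetical; the class labels follow the abbreviations used in the classification above.

```python
from collections import defaultdict


def type_token_ratios(tagged_words):
    """Compute the type-token ratio for each grammatical class.

    tagged_words: iterable of (word, grammatical_class) pairs.
    TTR = number of different words (types) in the class
          divided by the number of occurrences (tokens) of the class.
    """
    tokens = defaultdict(int)
    types = defaultdict(set)
    for word, cls in tagged_words:
        tokens[cls] += 1
        types[cls].add(word.lower())
    return {cls: len(types[cls]) / tokens[cls] for cls in tokens}


# Hypothetical tagged sample, used only to show the computation.
sample = [("il", "AR"), ("cane", "N"), ("e", "C"), ("il", "AR"),
          ("gatto", "N"), ("la", "AR"), ("casa", "N"), ("il", "AR")]

ttr = type_token_ratios(sample)
print(ttr)
```

In this sample the articles repeat (two types over four tokens) while every noun occurs once, which mirrors the low TTR of function classes and the high TTR of content classes.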


Discussion 

The results of the present study show that predictability
properties of written passages are remarkably homogeneous across
languages. When conditions are similar, as in this study and in
the deletion-rate-five condition of Fillenbaum et al. (1963), the order of
verbatim difficulty of content classes appears to be the same for
Italian and English: qualifying adjectives, adverbs, verbs, nouns,
and quantitative adjectives. Differences in functional classes
may be due to discrepancies in grammatical classification. More
generally, in both Italian and English (Fillenbaum et al., 1963;
Aborn et al., 1959; Coleman and Blumenfeld, 1963) predicting the
grammatical class of a word is about equally difficult for content
and functional items, but differential difficulty shows up when one
is asked to predict the specific content or functional item. Both in
Italian and English content words are twice as difficult as function words.

Classification into content or functional classes is very broad.
More specific determinants of predictability can be found by
searching through the frequency properties of language. The type-token
ratio of a particular grammatical class can be used as an index of
the mean frequency of use of a type in that class. Out of 100
nouns actually used, 92 are different nouns. Mean frequency for
a noun type is 1.09. Out of 100 articles actually used, only 16
are different words. Mean frequency for articles is 6.25. These
frequency properties of grammatical classes appear to determine
to a remarkable degree the mean predictability of words in each
class. The predictability of a particular word in a text is a function
of type frequency of the grammatical class to which it belongs.
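
The link between type-token ratio and mean type frequency can be written out explicitly. The symbols below are introduced here for illustration and do not appear in the report: N_tokens is the number of occurrences of a class, N_types the number of different words in it.

```latex
\[
\bar{f} \;=\; \frac{N_{\mathrm{tokens}}}{N_{\mathrm{types}}} \;=\; \frac{1}{\mathrm{TTR}},
\qquad
\bar{f}_{\mathrm{nouns}} = \frac{100}{92} \approx 1.09,
\qquad
\bar{f}_{\mathrm{articles}} = \frac{100}{16} = 6.25 .
\]
```

Mean type frequency is thus the reciprocal of the type-token ratio, so a low TTR corresponds to a high mean frequency of use.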

A more straightforward relationship between frequency and
predictability  may  be  seen  in  the  form  class  data.  Here,  guessing 
the  right  grammatical  class  in  a  particular  context  appears  to 
be  a  function  of  frequency  of  use  of  that  grammatical  class  in 
the  language. 

Apart  from  the  obvious  limitations  of  the  present  study,  a 
general  conclusion  can  be  drawn  regarding  the  determinants  of  complex 
linguistic  behavior.  Guessing  the  right  word  or  the  right  grammatical 
class  in  a  context  seems  to  be  a  complex  task  in  which  all  sorts 
of  sequential,  both  syntactic  and  semantic,  cues  should  influence 
performance. It is because these more complex determinants of
behavior  are  absent  in  most  experiments  involving  verbal  material 
that  frequency  may  emerge  as  an  important  factor  in  simpler  tasks 
such  as  rote  verbal  learning  tasks,  word  recognition  tasks,  and  so 
on.  The  present  data,  however,  seem  to  show  that  frequency  may  be 
an  important  factor  even  in  a  linguistically  very  sophisticated 
task such as predicting words in a context. More specifically, it
could require an extension of the "spew" hypothesis put forward by
Underwood  and  Schulz  (1960),  in  which  frequency  of  experience 
with  a  particular  verbal  unit  determines  its  availability,  to 
complex  verbal  behavior. 